Background

This file is designed to use CDC data to assess coronavirus disease burden by state, including creating and analyzing state-level clusters.

Through March 7, 2021, The COVID Tracking Project collected and integrated data on tests, cases, hospitalizations, deaths, and the like by state and date. The latest code for using this data is available in Coronavirus_Statistics_CTP_v004.Rmd.

The COVID Tracking Project suggests that US federal data sources are now sufficiently robust to be used for analyses that previously relied on COVID Tracking Project data. This code is an attempt to update modules in Coronavirus_Statistics_CTP_v004.Rmd to leverage US federal data.

The code in this module builds on code available in _v001, and splits many functions into two main .R files that can be sourced:

Broadly, the CDC data analyzed by this module includes:

Functions and Mapping Files

The tidyverse package is loaded and functions are sourced:

# The tidyverse functions are routinely used without package::function format
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.1     v dplyr   1.0.6
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
# Functions are available in source file
source("./Generic_Added_Utility_Functions_202105_v001.R")
source("./Coronavirus_CDC_Daily_Functions_v001.R")

A series of mapping files are also available to allow for parameterized processing. Mappings include:

These default parameters are maintained in a separate .R file and can be sourced:

source("./Coronavirus_CDC_Daily_Default_Mappings_v002.R")

Additionally, a mapping file could be maintained to give default plotting labels to variables. This is currently not used by any of the sourced functions:

# Create a variable mapping file - this is currently redundant
varMapper <- c()
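
As a purely illustrative sketch of what such a mapping could look like, a named character vector can map column names to plotting labels. The column names below appear in the processed output later in this document; the label text and the helper getVarLabel() are hypothetical and are not part of the sourced functions:

```r
# Hypothetical example only: the labels and the helper are assumptions, not sourced code
varMapperExample <- c("tot_cases"="Cumulative cases", 
                      "tot_deaths"="Cumulative deaths", 
                      "new_cases"="New cases", 
                      "new_deaths"="New deaths"
                      )

# Return the mapped label, falling back to the variable name when no mapping exists
getVarLabel <- function(x, mapper=varMapperExample) {
    ifelse(x %in% names(mapper), unname(mapper[x]), x)
}

# A mapped variable gets its label; an unmapped variable is returned unchanged
exampleLabels <- getVarLabel(c("new_cases", "hosp_adult"))
print(exampleLabels)
```

Falling back to the raw variable name means a plot would still get an axis label even when the mapping is incomplete.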

Example for Comparison to Previous

Code from the previous version is run, and the output is compared with previously saved results:

readList <- list("cdcDaily"="./RInputFiles/Coronavirus/CDC_dc_downloaded_210502.csv", 
                 "cdcHosp"="./RInputFiles/Coronavirus/CDC_h_downloaded_210509.csv"
                 )

cdc_daily_compare <- readRunCDCDaily(thruLabel="May 2, 2021", 
                                     readFrom=readList, 
                                     compareFile=list("cdcDaily"=colRenamer(readFromRDS("dfRaw_dc_210414"),
                                                                            c('new_case'='new_cases', 
                                                                              'tot_death'='tot_deaths',
                                                                              'new_death'='new_deaths'
                                                                              )
                                                                            ), 
                                                      "cdcHosp"=readFromRDS("dfHosp_old")
                                                      ), 
                                     writeLog="./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_v002.log", 
                                     ovrwriteLog=TRUE, 
                                     dfPerCapita=NULL, 
                                     useClusters=readFromRDS("cdc_daily_test_v2")$useClusters, 
                                     skipAssessmentPlots=FALSE, 
                                     brewPalette="Paired"
                                     )
## 
## No file has been downloaded, will use existing file: ./RInputFiles/Coronavirus/CDC_dc_downloaded_210502.csv
## 
## -- Column specification --------------------------------------------------------
## cols(
##   submission_date = col_character(),
##   state = col_character(),
##   tot_cases = col_double(),
##   conf_cases = col_double(),
##   prob_cases = col_double(),
##   new_case = col_double(),
##   pnew_case = col_double(),
##   tot_death = col_double(),
##   conf_death = col_double(),
##   prob_death = col_double(),
##   new_death = col_double(),
##   pnew_death = col_double(),
##   created_at = col_character(),
##   consent_cases = col_character(),
##   consent_deaths = col_character()
## )
## 
## *** File has been checked for uniqueness by: state date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: naconf
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 18
## Detailed differences available in: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_v002.log
## 
## Checking for similarity of: state
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 97 records
## Detailed output available in log: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_v002.log

## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 14 records
## Detailed output available in log: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_v002.log
## 
## 
## No file has been downloaded, will use existing file: ./RInputFiles/Coronavirus/CDC_h_downloaded_210509.csv
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   state = col_character(),
##   date = col_date(format = ""),
##   geocoded_state = col_character()
## )
## i Use `spec()` for the full column specifications.

## 
## *** File has been checked for uniqueness by: state date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: previous_day_admission_adult_covid_confirmed_18-19 previous_day_admission_adult_covid_confirmed_18-19_coverage previous_day_admission_adult_covid_confirmed_20-29 previous_day_admission_adult_covid_confirmed_20-29_coverage previous_day_admission_adult_covid_confirmed_30-39 previous_day_admission_adult_covid_confirmed_30-39_coverage previous_day_admission_adult_covid_confirmed_40-49 previous_day_admission_adult_covid_confirmed_40-49_coverage previous_day_admission_adult_covid_confirmed_50-59 previous_day_admission_adult_covid_confirmed_50-59_coverage previous_day_admission_adult_covid_confirmed_60-69 previous_day_admission_adult_covid_confirmed_60-69_coverage previous_day_admission_adult_covid_confirmed_70-79 previous_day_admission_adult_covid_confirmed_70-79_coverage previous_day_admission_adult_covid_confirmed_80+ previous_day_admission_adult_covid_confirmed_80+_coverage previous_day_admission_adult_covid_confirmed_unknown previous_day_admission_adult_covid_confirmed_unknown_coverage previous_day_admission_adult_covid_suspected_18-19 previous_day_admission_adult_covid_suspected_18-19_coverage previous_day_admission_adult_covid_suspected_20-29 previous_day_admission_adult_covid_suspected_20-29_coverage previous_day_admission_adult_covid_suspected_30-39 previous_day_admission_adult_covid_suspected_30-39_coverage previous_day_admission_adult_covid_suspected_40-49 previous_day_admission_adult_covid_suspected_40-49_coverage previous_day_admission_adult_covid_suspected_50-59 previous_day_admission_adult_covid_suspected_50-59_coverage previous_day_admission_adult_covid_suspected_60-69 previous_day_admission_adult_covid_suspected_60-69_coverage previous_day_admission_adult_covid_suspected_70-79 previous_day_admission_adult_covid_suspected_70-79_coverage previous_day_admission_adult_covid_suspected_80+ previous_day_admission_adult_covid_suspected_80+_coverage previous_day_admission_adult_covid_suspected_unknown 
## previous_day_admission_adult_covid_suspected_unknown_coverage
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 15
## Detailed differences available in: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_v002.log
## 
## Checking for similarity of: state
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 6 records
## Detailed output available in log: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_v002.log

## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 63 records
## Detailed output available in log: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_v002.log
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 6
##   isType tot_cases tot_deaths new_cases   new_deaths         n
##   <chr>      <dbl>      <dbl>     <dbl>        <dbl>     <dbl>
## 1 before   5.08e+9    1.07e+8   3.21e+7 558830       27435    
## 2 after    5.06e+9    1.06e+8   3.19e+7 556355       23715    
## 3 pctchg   4.40e-3    3.81e-3   4.47e-3      0.00443     0.136
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 5
##   isType     inp hosp_adult    hosp_ped          n
##   <chr>    <dbl>      <dbl>       <dbl>      <dbl>
## 1 before 2.57e+7    1.99e+7 436353      23230     
## 2 after  2.56e+7    1.98e+7 426239      22395     
## 3 pctchg 5.60e-3    5.66e-3      0.0232     0.0359
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum unknown in CRS definition

identical(cdc_daily_compare[c("stateData", "dfRaw", "dfProcess", "dfPerCapita", "useClusters")],
          readFromRDS("cdc_daily_test_v3")[c("stateData", "dfRaw", "dfProcess", "dfPerCapita", "useClusters")]
          )
## [1] TRUE
identical(cdc_daily_compare$plotDataList[c("dfFull", "dfAgg", "plotClusters")],
          readFromRDS("cdc_daily_test_v3")$plotDataList[c("dfFull", "dfAgg", "plotClusters")]
          )
## [1] TRUE

The core data elements are identical, and the plots appear to convey the same information. Next steps are to download the latest data and process with existing clusters.
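
The identical() checks above subset each list with a character vector of element names, so only the named components are compared and any other elements are ignored. A minimal sketch of that pattern with toy lists (all object names here are hypothetical):

```r
# Toy stand-ins for two runs of a process that returns a named list
runA <- list(dfRaw=data.frame(state=c("AL", "AK"), new_cases=c(10, 20)), 
             useClusters=c("AL"=1, "AK"=2), 
             runTime=1.23
             )
runB <- list(dfRaw=data.frame(state=c("AL", "AK"), new_cases=c(10, 20)), 
             useClusters=c("AL"=1, "AK"=2), 
             runTime=4.56
             )

# Compare only the elements that are expected to match across runs
keyElems <- c("dfRaw", "useClusters")
sameKeyElems <- identical(runA[keyElems], runB[keyElems])

# The full lists differ because runTime is different
sameFullList <- identical(runA, runB)

print(c(sameKeyElems=sameKeyElems, sameFullList=sameFullList))
```

Restricting the comparison to selected elements is useful when some components of a result, such as plot objects or timestamps, are not expected to be reproducible.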

Updated data are downloaded and processed using the existing clusters. The downloadTo argument is built with lapply so that each file is downloaded only if it does not already exist locally:

readList <- list("cdcDaily"="./RInputFiles/Coronavirus/CDC_dc_downloaded_210528.csv", 
                 "cdcHosp"="./RInputFiles/Coronavirus/CDC_h_downloaded_210528.csv"
                 )
compareList <- list("cdcDaily"=readFromRDS("cdc_daily_test_v3")$dfRaw$cdcDaily, 
                    "cdcHosp"=readFromRDS("cdc_daily_test_v3")$dfRaw$cdcHosp
                    )

cdc_daily_210528 <- readRunCDCDaily(thruLabel="May 28, 2021", 
                                    downloadTo=lapply(readList, FUN=function(x) if(file.exists(x)) NA else x), 
                                    readFrom=readList,
                                    compareFile=compareList, 
                                    writeLog="./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210528.log", 
                                    useClusters=readFromRDS("cdc_daily_test_v2")$useClusters, 
                                    skipAssessmentPlots=FALSE, 
                                    brewPalette="Paired"
                                    )
## 
## -- Column specification --------------------------------------------------------
## cols(
##   submission_date = col_character(),
##   state = col_character(),
##   tot_cases = col_double(),
##   conf_cases = col_double(),
##   prob_cases = col_double(),
##   new_case = col_double(),
##   pnew_case = col_double(),
##   tot_death = col_double(),
##   conf_death = col_double(),
##   prob_death = col_double(),
##   new_death = col_double(),
##   pnew_death = col_double(),
##   created_at = col_character(),
##   consent_cases = col_character(),
##   consent_deaths = col_character()
## )
## 
## *** File has been checked for uniqueness by: state date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 26
## Detailed differences available in: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210528.log
## 
## Checking for similarity of: state
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 593 records
## Detailed output available in log: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210528.log

## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 39 records
## Detailed output available in log: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210528.log
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   state = col_character(),
##   date = col_date(format = ""),
##   geocoded_state = col_character()
## )
## i Use `spec()` for the full column specifications.

## 
## *** File has been checked for uniqueness by: state date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 14
## Detailed differences available in: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210528.log
## 
## Checking for similarity of: state
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 3 records
## Detailed output available in log: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210528.log

## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 49 records
## Detailed output available in log: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210528.log
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 6
##   isType tot_cases tot_deaths new_cases   new_deaths         n
##   <chr>      <dbl>      <dbl>     <dbl>        <dbl>     <dbl>
## 1 before   5.99e+9    1.24e+8   3.29e+7 577667       28969    
## 2 after    5.96e+9    1.23e+8   3.28e+7 575010       25041    
## 3 pctchg   4.37e-3    3.82e-3   4.55e-3      0.00460     0.136
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 5
##   isType     inp hosp_adult    hosp_ped          n
##   <chr>    <dbl>      <dbl>       <dbl>      <dbl>
## 1 before 2.61e+7    2.03e+7 415621      23972     
## 2 after  2.60e+7    2.02e+7 405188      23109     
## 3 pctchg 5.67e-3    5.73e-3      0.0251     0.0360
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum unknown in CRS definition

saveToRDS(cdc_daily_210528, ovrWrite=FALSE, ovrWriteError=FALSE)
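
The downloadTo expression in the call above can be examined in isolation. The sketch below uses temporary files rather than the real input paths, and assumes (based on the description of the argument) that readRunCDCDaily() treats NA as "do not download":

```r
# Hypothetical paths: one file that already exists and one that does not
existingFile <- tempfile(fileext=".csv")
file.create(existingFile)
missingFile <- tempfile(fileext=".csv")

exampleList <- list("cdcDaily"=existingFile, "cdcHosp"=missingFile)

# NA where the file already exists, otherwise the path to download to
exampleDownloadTo <- lapply(exampleList, FUN=function(x) if(file.exists(x)) NA else x)
str(exampleDownloadTo)

# Clean up the temporary file
unlink(existingFile)
```

Because lapply preserves list names, the result keeps the cdcDaily and cdcHosp names needed to line up with readFrom.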

The process appears to work as intended. Next steps are to update the county-level data process, making use of some of the functions available for CDC data processing.

The latest version of the data is downloaded and processed:

readList <- list("cdcDaily"="./RInputFiles/Coronavirus/CDC_dc_downloaded_210708.csv", 
                 "cdcHosp"="./RInputFiles/Coronavirus/CDC_h_downloaded_210708.csv"
                 )
compareList <- list("cdcDaily"=readFromRDS("cdc_daily_210528")$dfRaw$cdcDaily, 
                    "cdcHosp"=readFromRDS("cdc_daily_210528")$dfRaw$cdcHosp
                    )

cdc_daily_210708 <- readRunCDCDaily(thruLabel="Jul 08, 2021", 
                                    downloadTo=lapply(readList, FUN=function(x) if(file.exists(x)) NA else x), 
                                    readFrom=readList,
                                    compareFile=compareList, 
                                    writeLog="./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210708.log", 
                                    useClusters=readFromRDS("cdc_daily_210528")$useClusters, 
                                    skipAssessmentPlots=FALSE, 
                                    brewPalette="Paired"
                                    )
## 
## -- Column specification --------------------------------------------------------
## cols(
##   submission_date = col_character(),
##   state = col_character(),
##   tot_cases = col_double(),
##   conf_cases = col_double(),
##   prob_cases = col_double(),
##   new_case = col_double(),
##   pnew_case = col_double(),
##   tot_death = col_double(),
##   conf_death = col_double(),
##   prob_death = col_double(),
##   new_death = col_double(),
##   pnew_death = col_double(),
##   created_at = col_character(),
##   consent_cases = col_character(),
##   consent_deaths = col_character()
## )
## 
## *** File has been checked for uniqueness by: state date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: 
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 40
## Detailed differences available in: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210708.log
## 
## Checking for similarity of: state
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 432 records
## Detailed output available in log: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210708.log

## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 43 records
## Detailed output available in log: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210708.log
## 
## -- Column specification --------------------------------------------------------
## cols(
##   .default = col_double(),
##   state = col_character(),
##   date = col_date(format = ""),
##   geocoded_state = col_logical()
## )
## i Use `spec()` for the full column specifications.

## 
## *** File has been checked for uniqueness by: state date

## 
## 
## Checking for similarity of: column names
## In reference but not in current: 
## In current but not in reference: deaths_covid deaths_covid_coverage
## 
## Checking for similarity of: date
## In reference but not in current: 0
## In current but not in reference: 42
## Detailed differences available in: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210708.log
## 
## Checking for similarity of: state
## In reference but not in current: 
## In current but not in reference:

## 
## 
## ***Differences of at least 5 and at least 5%
## 
## 3 records
## Detailed output available in log: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210708.log

## 
## 
## ***Differences of at least 0 and at least 0.1%
## 
## 57 records
## Detailed output available in log: ./RInputFiles/Coronavirus/Coronavirus_CDC_Daily_210708.log
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 6
##   isType tot_cases tot_deaths new_cases   new_deaths         n
##   <chr>      <dbl>      <dbl>     <dbl>        <dbl>     <dbl>
## 1 before   7.32e+9    1.49e+8   3.35e+7 596979       31329    
## 2 after    7.29e+9    1.48e+8   3.33e+7 594255       27081    
## 3 pctchg   4.40e-3    3.91e-3   4.57e-3      0.00456     0.136
## 
## 
## Column sums before and after applying filtering rules:
## # A tibble: 3 x 5
##   isType     inp hosp_adult    hosp_ped          n
##   <chr>    <dbl>      <dbl>       <dbl>      <dbl>
## 1 before 2.70e+7    2.11e+7 447142      26198     
## 2 after  2.69e+7    2.10e+7 435737      25251     
## 3 pctchg 5.65e-3    5.67e-3      0.0255     0.0361
## Warning in showSRID(uprojargs, format = "PROJ", multiline = "NO", prefer_proj =
## prefer_proj): Discarded datum unknown in CRS definition

saveToRDS(cdc_daily_210708, ovrWrite=FALSE, ovrWriteError=FALSE)
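
The pctchg row in the "Column sums before and after applying filtering rules" output appears to be the share of the "before" total removed by filtering, that is (before - after) / before. This definition is inferred from the printed values rather than taken from the sourced functions. A quick check using the new_deaths and n values from the final cdcDaily table:

```r
# Values copied from the last cdcDaily "Column sums before and after" table
before <- c(new_deaths=596979, n=31329)
after <- c(new_deaths=594255, n=27081)

# Assumed definition: fraction of the 'before' total removed by the filtering rules
pctchgCheck <- (before - after) / before

# Rounded to three significant digits for comparison with the printed pctchg row
print(signif(pctchgCheck, 3))
```

Rounded to three significant digits, these agree with the printed pctchg values of 0.00456 and 0.136.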